Supervised learning using Mahalanobis distance for record linkage

Abstract

In data privacy, record linkage is a well-known technique used to evaluate the disclosure risk of protected data. The main idea is to link records from different databases that refer to the same individuals. In this paper we introduce a new parametrized variation of record linkage relying on the Mahalanobis distance, and a supervised learning method to determine the optimum simulated covariance matrix for the linkage process. We evaluate and compare our proposal with other previously studied parametrized and non-parametrized variations of record linkage, such as the weighted mean or the Choquet integral, for which the optimal fuzzy measure is determined.

URL: http://agop2011.ciselab.org/proceedings
Source URL: https://www.iiia.csic.es/en/node/54955

Links:
[1] https://www.iiia.csic.es/en/staff/daniel-abril
[2] https://www.iiia.csic.es/en/staff/guillermo-navarro-arribas
[3] https://www.iiia.csic.es/en/staff/vicen%C3%A7-torra
[4] https://www.iiia.csic.es/en/staff/bernard-de-baets
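As a rough illustration of the distance-based linkage the abstract describes, the following Python sketch links each protected record to its nearest original record under a Mahalanobis-type distance. It is not the authors' method: the function names `mahalanobis_linkage` and `linkage_rate` are hypothetical, the default matrix is simply the sample covariance rather than one learned by supervised optimization, and the sketch assumes that record i in both files refers to the same individual.

```python
import numpy as np


def mahalanobis_linkage(original, protected, cov=None):
    """Link each protected record to its nearest original record.

    `cov` is the matrix whose inverse weights the distance. When omitted,
    the sample covariance of the stacked data is used (an assumption made
    for illustration, not the learned matrix mentioned in the abstract).
    """
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    if cov is None:
        cov = np.cov(np.vstack([original, protected]), rowvar=False)
    # Pseudo-inverse keeps the sketch usable if the covariance is singular.
    inv_cov = np.linalg.pinv(cov)
    # diff[i, j, :] = protected[i] - original[j]
    diff = protected[:, None, :] - original[None, :, :]
    # Squared Mahalanobis distance for every (protected, original) pair.
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    return d2.argmin(axis=1)


def linkage_rate(links):
    """Fraction of protected records linked to their true original,
    assuming record i in both files refers to the same individual."""
    links = np.asarray(links)
    return float(np.mean(links == np.arange(len(links))))


# Toy data: the "protected" file is the original plus additive noise.
rng = np.random.default_rng(0)
original = rng.normal(size=(50, 3))
protected = original + rng.normal(scale=0.05, size=original.shape)
links = mahalanobis_linkage(original, protected)
print(linkage_rate(links))
```

Passing `cov=np.eye(n)` reduces the distance to the squared Euclidean one, which makes it easy to compare the two variants on the same data. A supervised approach of the kind the abstract mentions would instead search for the matrix that maximizes the number of correct links.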


Similar resources

Supervised learning approach for distance based record linkage as disclosure risk evaluation

In data privacy, record linkage is a well-known technique to evaluate the disclosure risk of protected data. It is used to evaluate the number of linked records between a data set and its protected version. In this paper we give an overview of the work that we have carried out over the last few months. We describe the development of a supervised learning method for distance-based record linkage, w...



Using Mahalanobis Distance-Based Record Linkage for Disclosure Risk Assessment

Distance-based record linkage (DBRL) is a common approach to empirically assessing the disclosure risk in SDC-protected microdata. Usually, the Euclidean distance is used. In this paper, we explore the potential advantages of using the Mahalanobis distance for DBRL. We illustrate our point for partially synthetic microdata and show that, in some cases, Mahalanobis DBRL can yield a very high re-...


Learnable Similarity Functions and Their Applications to Record Linkage and Clustering

Many machine learning tasks require similarity functions that estimate likeness between observations. Similarity computations are particularly important for clustering and record linkage algorithms that depend on accurate estimates of the distance between datapoints. However, standard measures such as string edit distance and Euclidean distance often fail to capture an appropriate notion of sim...


Choquet integral for record linkage

Record linkage is used in data privacy to evaluate the disclosure risk of protected data. It models potential attacks, where an intruder attempts to link records from the protected data to the original data. In this paper we introduce a novel distance-based record linkage method, which uses the Choquet integral to compute the distance between records. We use a fuzzy measure to weight each subset of va...




Publication date: 2017